Characteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition

Author

  • Mariusz Kubanek
Abstract

This paper focuses on combining audio and visual signals for Polish speech recognition when the audio signal is heavily distorted. Audio-visual speech recognition was based on coupled hidden Markov models (CHMMs). The described methods were developed for single isolated commands; nevertheless, their effectiveness indicates that they should work similarly in continuous audio-visual speech recognition. Visual speech analysis is difficult and computationally demanding, mostly because of the very large amount of data that must be processed. Therefore, audio-visual speech recognition is used only when the audio signal is exposed to a considerable level of distortion. The paper proposes the author's own methods of lip edge detection and visual feature extraction. It also proposes and tests a method of fusing audio and visual speech features. Tests showed a significant increase in recognition effectiveness and processing speed for properly selected CHMM parameters, an adequate codebook size, and an appropriate fusion of audio-visual features. The experimental results were very promising and close to those achieved by leading researchers in the field of audio-visual speech recognition.

Similar resources

Bimodal speech recognition using coupled hidden Markov models

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences betwee...

Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models

In this paper we describe a speech processing system for Polish which utilizes both acoustic and visual features and is based on Dynamic Bayesian Network (DBN) models. Visual modality extracts information from speaker lip movements and is based alternatively on raw pixels and discrete cosine transform (DCT) or Active Appearance Model (AAM) features. Acoustic modality is enhanced by using two pa...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Publication date: 2012